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Abstract 

We study the rate of convergence of Wilkinson's shift iteration acting on 
Jacobi matrices with simple spectrum. We show that for AP-free spectra 
(i.e., simple spectra containing no arithmetic progression with 3 terms), con- 
vergence is cubic. In order 3, there exists a tridiagonal symmetric matrix Po 
which is the limit of a sequence of a Wilkinson iteration, with the additional 
property that all iterations converging to Po are strictly quadratic. Among 
tridiagonal matrices near Po, the set X of initial conditions with convergence 
to Po is rather thin: it is a union of disjoint arcs X s meeting at Po, where s 
ranges over the Cantor set of sign sequences s : N — > {1, —1}. Wilkinson's step 
takes X s to X s i , where s' is the left shift of s. Among tridiagonal matrices 
conjugate to Po, initial conditions near Po but not in X converge at a cubic 
rate. 

Keywords: Wilkinson's shift, QR algorithm, inverse variables, symbolic dynamics. 
MSC-class: 65F15; 37E30. 

1 Introduction 

In this paper, we study of the asymptotics of the shifted QR iteration with the so 
called Wilkinson's shift, acting on Jacobi matrices. More precisely ([U]|, El- 0), 
for an n x n real symmetric tridiagonal matrix T and a real number s, write, if 
possible, the unique QR factorization T — si = QR, where Q is orthogonal and R 
is upper triangular with positive diagonal. Wilkinson's shift strategy is the choice 
of s = uj(T) equal to the eigenvalue of the bottom 2x2 principal minor of T which 
is closest to the bottom entry T nn . A Wilkinson step obtains a new matrix 

W(T) = Q*TQ = RTR-\ 

From both defining formulae, W(T) is symmetric and upper Hessenberg, and thus, 
must also be a real, symmetric tridiagonal matrix, with the same spectrum as T, 
as well as the signs of the nontrivial off-diagonal elements. 

As is well known (|7j), if the iterates T% = W fc (T) of the Wilkinson step starting 
from a Jacobi matrix with simple spectrum exist for arbitrary k, then their lowest 
off-diagonal entry tend to 0: we are interested in the rate of convergence of this 
sequence. It has been conjectured ([7], 01) that the rate is cubic, i.e., for any Jacobi 
matrix T there exists a constant C such that |(Tfc + i)„, n _i | < C\(Tk) n ,n-i\ 3 ■ As we 
shall see, this is true for most matrices T but false in general. A Jacobi matrix T is 
an AP-matrix if its spectrum contains an arithmetic progression with three terms 
and is AP-free otherwise. For AP-free matrices, the conjecture is indeed true. On 
the other hand, there exist 3x3 AP-matrices for which the rate of convergence is, 
in the words of Parlett, merely quadratic. 

We give an outline of the proof. A basic ingredient are the bidiagonal coor- 
dinates, consisting of eigenvalues Xi, i = l,...,n, and additional variables 



i = 1, . . . ,n — 1, defined on large open sets of tridiagonal matrices. As proved in 
[S], the set T\ of real, symmetric, tridiagonal matrices with fixed simple spectrum 
Ai < • • • < A n is covered by open dense subsets U][, indexed by permutations ir and 
bidiagonal coordinates provide a diffeomorphism between each U\ and R™ -1 . The 
new coordinates are used to convert asymptotic issues of Wilkinson's iteration into 
local theory in an appropriate chart. 

There are two subsets of 7a in which Wilkinson's step might break down. First, 
let Z C 7a be the set of matrices T for which the shift u>(T) equals an eigenvalue of 
T: we are especially interested in T>q^ C Z, the set of matrices T with (T)„. n = A.; , 
(r)n.n-i = 0. Second, the two eigenvalues w+(T) > co-(T) of the bottom 2x2 
block T may be equally distant from the corner entry (T)„ „: let y C 7a. be the 
set of such matrices T. It is clear that the function W is smoothly defined at least 
in 7a — Z — y but using bidiagonal coordinates we shall see that the domain can 
be taken to be much larger. Indeed, for a matrix T G U\ C 7a with bidiagonal 
coordinates A* and ffi, T g Z U y, the f3? 's of W(T) are given by 



In bidiagonal coordinates, points in I?o.7r(n) H U\ satisfy /3? l _ 1 = and an inspec- 
tion of the formula above shows that extends smoothly to £>o, ?!•(«) — 3^- More 
generally, near p € T^o,ir(n) the quotient /3^_ 1 /(T) n „_ 1 is bounded above and 
below and rates of convergence are the same in both variables. We then expand 
(W 7r (/3j r , . . . , Pn-i)) n -i in a Taylor series around po. Notice that ui(T) sa A 7r („) 
near po and oddness of this function in the variable allows for 



for some C > 0, T in a neighborhood of pq. 

Points of y with 7^ are step-like discontinuities for W. The behavior of 
W near matrices p £ y C\Z is more complicated; in figure[21we show what happens 
for A = (1,2,4), a typical spectrum. The upshot from the figure is that typically 
the few points in y fl 2?o.ii f° r which the cubic estimate does not hold, are isolated 
in the sequence of iterations T k = W k (T ) and are irrelevant in the long run. More 
precisely, the argument holds for AP-free spectra, i.e., spectra which contain no 
three terms arithmetical progression. In such case, there exist positive constants C 
and K such that \(Tk+i) n ,n~i \ > C'|(7 1 fe)n,n-i| 3 holds for at most K values of k (see 
theorem 13. 411 . yielding genuine cubic convergence of Wilkinson's iteration. 

In the 3x3 AP case, instead, there exists a point po <= y H T> ,i which is kept 
fixed by W. The graphs of lo + and cj_ near po resemble the two branches of the 
cone z 2 = xz + y 2 near the origin, x and y corresponding to /3f and f3%, respectively. 
Sections of the cone by planes x — a, a ^ 0, are hyperbola with a branch passing 
through (a, 0,0) and z is therefore of the order of y 2 for small y: this quadratic 
behavior of ui implies the cubic estimates for the (n, n — 1) entry under W. For 
a — 0, however, the intersection of the cone with the plane x = a is z = ±|y|; z 
and therefore ui are of the order of \y\ and respectively: this entails a quadratic 
estimate for the (n, n— 1) entry under W. Hence, the rate of convergence is dictated 
by whether remains near po when k goes to infinity. It turns out that Tk tends 
to pq (and cubic convergence fails) if and only if To £ X , where A" is a remarkable 
set (see figure[7J). A point To € X has a sign sequence s : N — > {1, —1}, where s(k) 
indicates whether ojITu) equals lo + or The set S of sign sequences is a Cantor 




for some smooth function G, yielding the cubic estimate 



l( w ( T ))n,n-i| < C \i T )i 
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set and X is the disjoint union of graphs of Lipschitz functions f s : [—a*, a*] — > R 
with /s(0) = for all s G 5; a good reference for dynamically defined Cantor sets 
is UJ. For yo G [—a* , a*], (/s (2/0)5 2/o) S A* has sign sequence s. There is an open set 
of points Tq ^ X which are between branches of X: for sufficiently large k, their 
iterates will be either to the left or to the right of X and cubic convergence 
applies. Another application of dynamical systems to numerical spectral theory is 
the work of Batterson and Smillie (£Q) on the Rayleigh quotient iteration. 

We begin the paper collecting from the required information about bidiag- 
onal coordinates, essentially, the description of Wilkinson's iteration in terms of 
bidiagonal variables. Section 2 also contains a key ingredient: the construction of 
a function h(T) which grows along Wilkinson's iterations. The proof that h(T) 
indeed satisfies this property is indirect: it requires the interpretation of a QR step 
as the time one map of a Toda flow. The monotonicity of h then follows by a simple 
differentiation argument. 

In section 3 we prove cubic convergence of Wilkinson's shift iteration for AP- 
free matrices. In sections 4 and 5, we show that for 3x3 AP-matrices, the rate of 
convergence is usually cubic but strictly quadratic for a thin set of initial conditions. 

The authors acknowledge support from CNPq, CAPES, IM-AGIMB and Faperj. 

2 Preliminaries 

Let 7a be the set of real, symmetric, tridiagonal matrices with simple spectrum 
Ai, . . . , A„ and set A = diag(Ai, . . . , A„). For T G 7a, write T = Q*AQ for some Q G 
0(n). Write the PLU factorization of Q, i.e., Q = PLU where P is a permutation 
matrix, L is lower unipotent and U is upper triangular. This is usually possible 
for P = P^ for several permutations 1 e 5„. Indeed, since Q is invertible, there 
is a matrix P~ X Q obtained by permuting the rows of Q for which all the leading 
principal minors are invertible. For a permutation 71", define U\ to be the set of 
matrices T G T\ for which P^ 1 Q admits an LU factorization. The following lemma 
provides other descriptions of U\. Let £ be the set of diagonal matrices having the 
values 1 or —1 along the diagonal. 

Lemma 2.1 (|5j, lemma 3.1) Take T G T\ with unreduced blocks T%, .. . ,Tk of 

sizes m , . . . , Uk along the diagonal. Then T G U\ if and only if the eigenvalues of 
Ti are 

\ 7T \ 7T 

niH h««-i+l' ' ' ' ' ttiH hn<_i+nj ■ 

Alternatively, U\ is the union of all open faces in T\ adjacent to A w . Also, for 
Ee£,ifTeU£ then ETE G U\. 

For T G U\, we define the tt -normalized diagonalization as the unique factor- 
ization T — Q% t^Q-n for which the LU factorization of yields a matrix U with 
positive diagonal. Notice that = EP~ 1 Q for some E G £ . 

We now construct charts for T A , ^ : M™" 1 -> U\ and ^ : U\ -> R" _1 
with </> ff (0) = A 71 " G U\. For T G U\, consider its 7r-normalized diagonalization 
T = Q*A*Q V and Q = P^Q^ so that T = Q*AQ. Write = (which 
is possible because the leading principal minors of Q n are positive) and therefore 
P-rrLn — QRn, where i?7r = U~ x is also upper triangular. Set 

Btt = R n 1 TR 1T = L n 1 A 7T Lt T - 

Notice that the columns of L~ x are the eigenvectors of B^. From B„ = i?~ 1 Ti? 7r , B v 
is upper Hessenberg and from B^ = L~ 1 A 7r L 7r , it is lower triangular with diagonal 
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entries A", . . . , Summing up, B n is lower bidiagonal: 



/AT 

01 



V 



HI 



\ 71 

A 3 



Define i/v to be the map just constructed taking T g 1A\ to . . . , We call 

Pi, ... , Pn-i (together with A^, . . . , A£) the 7: -bidiagonal coordinates of T. 



Theorem 2.2 (0, theorem 3.4) T/ie map ip n : U\ 
with inverse <f>„ : M" _1 — > U\. 



is a diffeomorphism 



The map 4> n takes open quadrants of R™ -1 diffeomorphically to the connected 
components of 7a formed by irreducible matrices (connected components are in- 
dexed by signs of off-diagonal entries). Also, the hyperplanes j3J = in R n_1 arc 
taken diffeomorphically to the set of matrices T in UJ^ with (T) i i+1 = 0. Indeed, it 
is shown in theorem 4.5 of [5] that the quotient 0f /((T^j+i) is a smooth nonzero 
function in U\. The following lemma follows by compactness. 

Lemma 2.3 Given a compact subset K w C U\, there exist positive constants C < 
C such that, for T G K n , 

C\(T) n>n _ l \<\l%_ 1 \<C'\(T) nin _ 1 \. 

We say that a function a : 7/v — > R is even if a(ETE) — a(T) for any T € 7a and 
E G £. The following lemma translates this definition to bidiagonal coordinates. 

Lemma 2.4 ([5j, lemma 3.6) If 'E = diag(oi, . . . , a n ) G £, T £ U\, and<ip w (T) = 



A function a : 7a 
each coordinate. 



METE) = {g x o 2 01, cj n -xo n Pl-i) 
— > E is even if and only if each a o <j} v 



is even in 



As an example of bidiagonal coordinates, let A = diag(— 1,0, 1). Set 7r(l) = 3, 
7r(2) = 1, 7r(3) = 2. Matrices will be described by their 7r-bidiagonal coordinates 
x — (i\ and y — (3% • Since B n = L~ 1 A 7r L 7T , we obtain 



A w 






and writing = L^U^ we have 

1 



2r 2 2x(l + 2y 2 ) xyn 
Q-k = — I -xr 2 2(2 + x 2 y 2 ) -2yr x 
Tir2 » -2xyr 2 y(A-x 2 ) 2r x 



where T\ = y/A + x 2 + Ax 2 y 2 and r 2 = a/4 + 4y 2 + x 2 y 2 . From T = Q^A^Q^, 



'(4 - x 2 )r 



T = 



,2„2 



2xr| 



2xr\ 





-4(4 - a; 2 - 4x 2 y 4 + xV) 2yrf 

2yrf y 2 (x 2 — 4)r\ 
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This example will be revisited in sections 4 and 5. 

For an invertible real matrix M, write the unique QR factorization 

M = Q(M)R(M), 

where Q(M) is orthogonal and R(M) is upper triangular with positive diagonal. For 
some function / taking nonzero values on the spectrum of a tridiagonal symmetric 
matrix To, the QR step induced by f is the map 

F(T) = Q(/(T))*TQ(/(T)) = R(/(T)) TR(/(T)) _1 . 

From both equalities, we learn that F(T) is also tridiagonal, with same signs and 
zeroes along the off-diagonal entries than T. The standard QR step corresponds to 
f(x) = x and taking a shift s means taking f(x) = x — s. The map F admits a 
simple description in terms of bidiagonal coordinates. 

Proposition 2.5 ([5 , proposition 4.2) For f taking nonzero values on the spec- 
trum of T, 



(Foi)^,...,^)^, 



PI,- 



f(K(n)) 



f{K(n-l)) 



We now present some technical results concerning the dynamics QR type iter- 
ations which will be needed in the proof of theorem 13.41 The argument is rather 
indirect and seems to require the language of Toda flows. For a square matrix M, 
let S = II a M be the skew symmetric matrix for which (M)ij = (S)ij for i > j. The 
following result, which follows by direct computation, relates Toda flows and QR 
iterations. 

Proposition 2.6 ([8 ) Let A be a diagonal matrix with simple spectrum, g : R — * R 
be a smooth function, X g be the vector field X g (T) — [T, H a g(T)] and T : R — > 7a 
be a path satisfying j- f T = X g (T), T(0) = T . Then 

T(t) = Q(exp(t. 9 (T )))*T Q(exp(t. 9 (T ))) =R(exp(t. g (T )))r R(exp(^(r )))- 1 , 

or, in other words, T(t) = F(Tq) where f(x) = exp(t g(x)). 

The dynamics of Toda vector fields is rather simple: they are essentially gra- 
dients of Morse functions. The following result is known for g(x) = x (0, 0) 

El). 

Theorem 2.7 Let g : R — > R satisfy g(Xi) ^ g(Xj) for distinct eigenvalues Aj, Xj of 
A. Let X g be the Toda vector field [T,H a g(T)] onT\. Let M = diag(/ii,^2, ■ ■ ■ , Mn)j 
A*i > A*2 > ' ' ' > Mn/ let h>M,g '■ 7a R be the smooth function hu,g(T) — 
tr(Mg(T)). Then the directional derivative X g hu,g is strictly positive except at 
diagonal matrices. 

Proof: Define a path T : R — > 7a satisfying 4rT = X g as above. We first claim 
that -3T<7(T) = [g(T),IL a g(T)] for any smooth function g. Indeed, since Toda flows 
preserve spectra, g may be replaced by a polynomial p which coincides with g on 
the spectrum of A. By linearity, it suffices to consider Pk(x) = x k , which is handled 
by induction on k: 

|p fe+1 (T) = l(T Pk (T)) = T (^(T)) + (| T ) Pk (T) 
= T[T fc ,n Q /(T)] + [T,n Q /(T)]T fc 

= T fe+1 (n a /(T)) - T(II a /(T))T fc + T(n a /(T))T fc - (n a /(T))T fc+1 

= b fc+1 (T),n a /(T)]. 
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Take g = g and compute the derivative of fiM,g along the path T: 
= tr M \j t 9(T^j = tr(Af[ 5 (T),II a3 (T)]) 

= 2 (^-^)(5( T ))l- 

l<i<j<n 

Since g(T) has simple spectrum, it is diagonal only if T also is and we are done. 

■ 

Corollary 2.8 Let f : M — > K be a function satisfying |/(Ai)| ^ |/(Aj)| ^ for 
i =/= j and let F : 7a — > 7a 6e i/ie Qi? step induced by f . Let K, C 7a be a compact 
set containing no diagonal matrices. Then there exists K > such that, for all 

TeT A , 

\{k £ N\F k (T) e K}\ < K. 

Proof: Consider g(x) — log |/(a;)|, X g , T, M and /iM,g as in the previous theorem. 
Since K, is compact and avoids diagonal matrices, X g hM, g > e > on K,. From 
proposition 12. 61 F k (To) — T(k) and we are done. ■ 

It is not true that, given K,, there exists if 2 such that, for all To £ 7a, F k (To) ^ JC 
for all k > Ki. For instance, the exit time from a small neighborhood of a diagonal 
matrix T is not uniformly bounded; another situation is given in figure [3] below. 



3 Wilkinson's shift and AP-free matrices 

We consider iterations of shifted QR steps, induced by f s {x) = x — s. The function 
F s : 7a — > 7a is defined whenever s is not an eigenvalue. Frequently, the shift s is 
taken to depend on T. We consider the asymptotic properties of a special choice of 
shift, which defines Wilkinson's step ( 10 ). For T S T\, let lu + {T) > uo-{T) be the 
two eigenvalues of the bottom 2x2 block of T and set lo(T) to be the one nearest to 
the entry (T)„.„. This defines continuous maps w+, ui- : 7a — > K which are smooth 
in 7a.— yo, where J^o C 7a is the set of matrices T for which (T)„ in = (T) n _i jn _i and 
(T) n ,„_! = 0. Indeed, T £ y and only if u+(T) = w_(T). Also, w : T A - ^ -> K 
is smooth where T G ^ C 7a if (T) n<n = (T) n _i )TS _x. Indeed, T S ^ if and only if 
(T) n .n is equally distant from both eigenvalues of its bottom 2x2 block, which is 
the only way smoothness could fail. Also, u>± and u> are even in the sense of lemma 

Lemma 3.1 The functions u>± : 7a — > K are Lipschitz. 

Proof: The function taking T to its bottom 2x2 block is clearly smooth. The 
function taking a 2 x 2 symmetric matrix A to its larger (resp. smaller) eigenvalue 
is Lipschitz in compact sets. The lemma follows from composition. ■ 

Let Z k = {T e T A ; w(T) = A fc }, Z = [j k Z k , Z k = Z - Z k . Notice that if 
(T) n> „_i = then ui(T) = (T) n:1l = X k for some k and therefore T £ Z k . Also, if 
(T)„_i,„_2 = then again cu(T) = X k for some k. Define W:7a — y — Z^7Xby 
W(T) = F u (t)(T) for f s (x) = x - s. Notice that W is an odd function. 

Figuren snows <7a for A = diag(l, 2, 4) and A = diag(— 1, 0, 1). Labels indicate 
the diagonal entries and vertices are diagonal matrices. The set y, on which W 
is not defined, degenerates in the second example. Vertices are fixed points and 



G 



edges are invariant under W. A simple arrow indicates the order of the points 
T, W(T), W 2 (T), . . . along the edge. For points T on an arc labeled with a double 
arrow, W(T) is a vertex. Arcs marked with a transversal segment consist of fixed 
points of W. 




Figure 1: The phase space of Wilkinson's step for n = 3. 

We apply proposition 12 . 51 to write down Wilkinson's step in bidiagonal coordi- 
nates. Define by 



w^,.,W 



\r(2) 



K(i) 



Pi, 



^Tr(ri-l) 



where lo = w(^ 7r (/3| r , . . . , Thus, the natural domain for W w is R™ 1 — 

</>~ 1 (y n U][) — 0^T 1 (Z 7r („) n U][), where W w is a smooth function, indicating that 
points in 0~ 1 (Z 7r( -„)) are removable singularities and that, despite absolute values 
in the formula, is smooth at such points. Notice that is odd, since u> is 
even in each variable /3| r . Also, points in 0~ 1 (Z 7r („)) are of the form (3%-i = or, 
equivalently, (T)„ iTl _i = 0. 

Points of y with /3^ L _ 1 ^ are step-like discontinuities for W. The behavior of 
W near matrices po G yC\Z is more complicated; in figure|5]we show what happens 
for A = (1, 2, 4), a typical 3x3 AP-free spectrum. Close to 



Po 




tt(1) = 1, tt(2) = 3, tt(3) - 2, ff = 3V2, & = 0, 



the set y divides the plane into two sides T> + and P_ , where we take ui+ and uj- , 
respectively, in the definition of W. From each side T>±, the function W can be 
continuously extended to y but the two values thus obtained are quite different 
except at p$. 

The figure is drawn with a vertical stretching factor of 200. The thick vertical 
curve (which looks like a straight line due to stretching) is the set y. To the right, 
the curve BF (with a cusp at W(po)) is the image of the arc in y from /3J = 0.1 
to /?2 = —0.1 under W with the choice w = to the left, the curve CG is the 
image of the same arc, now with u> = uj + . The dotted lines AB and CD are the 
image under W of the horizontal line (3^ =0.1: notice the jump discontinuity at y 
from B to C. Oddness of W in the coordinate /3J explains the mirror symmetry in 
the horizontal axis. 

The study of the asymptotic behavior of the iterations of Wilkinson's step be- 
comes significantly simpler by taking into account the following result. 

Theorem 3.2 ([7 , p. 152) If T k = W k (T ) then lim fc ^ co (T fc ) n ,„_ 1 = 0. 

From this result, deflation is always possible in numerical implementations of 
Wilkinson's iteration: T will be truncated so as to split into blocks of size n—1 and 
1. We will investigate the rate of convergence of {Tk) n ,n-i to 0. 
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Figure 2: Image of y under for A = diag(l, 2, 4) (stretched vertically). 

Proposition 3.3 Let T>(e) = {T G T\ | \{T) n .n-i\ < e}- Given a spectrum Ai < 
• ■ • < A n , there exists e q > and C q > such that W(2?(e)) C L)(e) for any e < e q 
and\(W(T)) rhn ^\<C q \(T) nin ^\ 2 forT eV(e q ). _ 

Furthermore, for any subset T> c C 2?(e g ) satisfying T> c PI 3^ = i/iere exists 
C c > sucft i/iai |(W(T)) n , n _i| < C C |(T)„ ! „_ 1 | 3 /or T e £> c . 

Proof: Define smaller open sets C AT^ C i/J where each is compact such 
that the open sets V 11 still cover T\. Let P c Ta be the set of matrices T with 
(T)n.n-i — 0. The set T>q is a compact submanifold of codimension 1. The set T>q 
is not contained in any V n but is clearly covered by them and we can then write the 
7r-bidiagonal coordinates /Jf ,• •■ ,/3^_i for Te V. Lemma l2~3l allows us to identify 
in V 7r rates of decay for (r) ni „-i and fl^-i 4 

Let ei < | Ai — Aj|/2 for all i ^ j- We can take eo > such that T £ £>(eo) implies 
|(r)„ )n — Ai| < ei for some (unique) i (i depends on T). Call T> l (e ) C £>(eo) the 
set of such T. Assume without loss that £> l (eo) is covered by V" for permutations 
7r satisfying 7r(rt) = z. In 2? l (eo), W in 7r-bidiagonal coordinates is given by 

Am 



V(2) 



A; 



7t(n— 1) 



— a; 



In each V 71 ' PI X> l (eo), w is a continuous function taking the value Ai when = 0. 
Since w± are Lipschitz functions of T, | A» — w\ < L\/3*-i\ for some L > 0, implying 
from the formula for W ff the quadratic rate of decay for /3^_ 1 . By taking eo even 
smaller, we can assume that |Ai — w\ < t\ in this set so that the quotients 

A,r(2) — W A T („_i) - w 



— CJ ' A,r( n _2) - w 

have absolute value bounded and bounded away from zero and the first claim follows 
by compactness. 

The Taylor expansion of the last coordinate of centered at points T £ 
(V n n T>o) — yo with respect to the variable P^-i ls °f the form 

The function W„. is odd, so ao = 02 = 0. The factor (A n ( n ) — w)/(A jr ( n _x) — w) 
equals if P^-i = an d therefore ax = 0. In other words, 

for some real analytic function G. Again by compactness, we are done. ■ 
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A matrix T is an AP-matrix if its spectrum contains an arithmetic progression 
with 3 terms, i.e., some eigenvalue is the average of two others; otherwise, T is 
AP-free. We also refer to AP-free spectra, Jacobi cells, isospectral manifolds and 
so on, with the obvious meanings. The left hexagon in figure ^ is AP-free and the 
right hexagon is not. 

Theorem 3.4 For AP-free tridiagonal matrices, Wilkinson's iteration has cubic 
convergence. More precisely, given an AP-free spectrum Ai < • • • < A n , there exist 
C > and K > such that, for any To G 7a, 

(a) ifk > K then |(T fc+1 ) n ,„_ 1 | < 1/C and |(T fe+1 ) n , Ti _ 1 | < C|(T fe ) n ,„-i| 2 ; 

(b) the set of positive integers k for which \(Tk+i) n ,n-i\ > C|(Tfc) n! „_i| 3 has at 
most K elements. 

Proof: We keep the notation of the proof of proposition ^. 31 Item (a) follows from 
theorem 13.21 proposition 13.31 and compactness. Indeed, for any given e > there 
exists K\ such that for any To G 7a and for any k > K\ we have Tfc G T>(e). For 
item (b), we need to prove that, given an open T> c d T\ containing the diagonal 
matrices, there exists Ki such that, for any To G 7a, the set of positive integers 
k for which Tfc ^ T> c has at most K-i elements. Notice that 3^o is removed from 
diagonal matrices and therefore such a set T> c exists. 

As a warm-up case, let T>o,i C Zi be the set of matrices T G 7a for which 
{T) n ,n = A^ (T), v „_i = 0. Clearly, T>o,i are the n connected components of T>q and 
the set T>o,i is diffeomorphic to T\ t , where 



that is, A without A^. Points in X>o,i are removable singularities of W and there W 
is a QR step on 7/v ; (the top (n — 1) x (n — 1) block) with f(x) = a; — A*. Apply 
corollarv l2.8l to conclude this special case: simplicity of A ensures that Ai is not an 
eigenvalue of the top (n— 1) x (n — 1) block and the fact that A is AP-free ensures 
that \lambdaj — Aj| =/= \Xf — Aj| for distinct eigenvalues Xj and Ay of said block. In 
general, we need to extend the reasoning in corollarv l2.8l to a neighborhood of T>q^. 

Let b : 7a — > K be the smooth function defined by b(T) = (T) n .n-\- From 
lemma l2~3l b is transversal to T>o; let Z be a smooth vector field in 7a such that the 
directional derivative Zb satisfies Zb — 1 in V(ez) for some ez > 0, ez < e q . The 
vector field Z can be integrated to yield a diffeomorphism £ : T\) x (— ez,ez) — * 
T>(ez). Let 'Di(ez) = C(^o,< x ( — e.z,ez)): we have a diffeomorphism £ : 7^ x 



(-ez, e z ) -» Di(ez). Set W, : T Aj x (-e z , e z ) -» T Aj x (-ez, ez), W, = (C) _1 oWoC 



and, for to G 7a ; x (— ez, ez), tfc is defined recursively by t^+i = Wj(tfe). 

Let g(x) = log(|x — Ai|) and M — diag(n — 1, n — 2, . . . , 2, 1). Let X g tangent 
to Ta; and hu,g ■ T\ t — ► R be as in theorem 12.71 Extend /ijw,g to /ijw.o : 7a s X 
(— ez, ez) — » R by ignoring the second coordinate. Let T : R — * 7X; with ^T = X g : 
from proposition ^ . 61 we have Wi(T(0), 0) = (T(l), 0). Therefore, from theorem |2.7l 
/iM, 9 (Wi(to)) > hu,g(to) for any to = (To, 0) G 7^ x {0}, To not a diagonal matrix. 
Set 6 : 7a s x (-ez,e z ) -> R, <5(t ) = ^/,g(W 4 (t )) - h M , g (t ); by compactness of 
the complement of T> c , there exists e > such that 6 (to) > e if to G (7^ x {0}) — 
(£ _1 (X> C )). Since (5 is continuous in 7^ x {0}, there exists e' > 0, e' < ez, such 
that t G (7a ; x (-e',e')) - C'H^c) implies 6(t Q ) > e/2. Clearly, for any to in 



and therefore there are at most 2(max/iA/. g — mhxhu,g)/^ values of k for which 



Ai = diag(Ai, . . . , Aj_i, A i+ i, . . . A„), 



r K x (-e',e'), 




tfc ( 1 (T' C ) and we are done. 
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Figure 3: We may have Tk € y for large values of k. 



On the other hand, it is not true that given an AP-free spectrum A there exist 
C > and K such that |(Tfe_|_i) ni „_i| < C\(Tk) n ,n-i\ 3 for all k > K. A counterex- 
ample is indicated in figure |3 the orbit may spend an arbitrarily large number of 
steps near the saddle point S and we may therefore have Tk £ y for arbitrarily 
large k. 



4 Wilkinson's step for 3x3 AP-matrices 

In this and the following sections we prove that for any spectrum of the form 
{a — b, a, a + b} there exist matrices To for which 




lim Tfc = 

k — >oo 

and that the convergence of „_i towards is still quadratic but not cubic. 

For a = 0, b = 1, this limit is the point labeled (0, 0, 0) at the bottom of the right 
hexagon in figure ^ 

Up to normalizations, a 3 x 3 AP-matrix is isospectral to A = diag(— 1, 0, 1). 
We use the 7r-bidiagonal coordinates for matrices in U\ computed in section 2 for 
an appropriate permutation n. The bottom 2x2 block of T € U\ is 



T = 



1 /-4(.t 2 -4)(aV - 1) 2yr\ 



2„2 I 0, (T .3 



2yr{ y (x — 4)r\ 



where r\ = \J A + x 2 + Ax 2 y 2 , T2 — \/4 + 4y 2 + x 2 y 2 . Let uj + > be the (real) 
eigenvalues of T; by interlacing, —1 < oj_ < < lj + < 1, with equality only 
when x = or y = 0. The discriminant of the characteristic polynomial of T is 
A = {{x + 2) 2 + 8x 2 y 2 )((x — 2) 2 + 8x 2 y 2 ) > which is zero exactly at the points 
ipo, Po = (2, 0). These points correspond to the matrices ±Po, 

/0 1 0^ 
Po = 1 

\0 0y 

and the origin to the matrix A^. 

The eigenvalue closest to the bottom element (T) 2 ,2 = (^)3,3 is ^+ if and only 
if (T)3 i 3 > (uj + + U-)/2. A straightforward computation yields 

UJ++UJ- _ (2 - x)(2 + x)(4 - 4y 2 - x 2 y 2 - 8x 2 y 4 ) 
(T)a,3 — ^| 

which indicates that the xy-plane is divided by y into regions, the choice between uj + 
and U- being as in figure 0] (where only the region x > is shown). The hexagon 
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of matrices on the right of figure ^ is the image under of the first quadrant 
(of 7r-bidiagonal coordinates) in figure the reader may check, for instance, that 
0^(2,0) = P and that 




Figure 4: The choice of ui. 

We consider the region 

R = {{x, y)\x > 0, 4 - Ay 2 - x 2 y 2 - 8x 2 y 4 > 0} 

and the subsets R+ = Jin ((0,2] x K), i?_ = Rn ([2,+oo) x M). For < a < 1/10, 
define the wedge of height a to be 

V a = {(x,y)\\y\<a,\y\ >\x-2\/10}. 

Lemma 4.1 The functions lu± are smooth in {x,y) £ R except at the point po, 
where they have a cone-like behavior: 

(x-2)± J(x - 2) 2 + 32y 2 „ n9 „ 
w± = — ^ — + 0((a: - 2) 2 + 32y 2 ). 

For (a;,y) S R+ (resp. R-), < uj + < 2\y\ (resp. — 2\y\ < w_ < 0J. There exists a 
positive constant C such that \u>± \ > C\y\ for (x,y) G V a . 

Proof: We have 

-4 + x 2 ±y/A 



UJ± 



2r\ 



The displayed estimate for uj± follows directly from 

A 

(x, I/ H(2,o) 16((a: - 2)2 + 32y2) ~ ' 

The signs of u± follow from interlacing and the other estimates are now easy. 



Lemma 4.2 The partial derivatives (lo±) x and {oj±) y are uniformly bounded in 
R — {po}- For all (x,y) G R± — {po} we have (uj±) x > 0, with equality exactly 
when y = 0. Also, (lo±) x > 1/120 in any wedge V a . Furthermore, for y =/= 0, 
±y(v±) y > 0. 
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Proof: A straightforward computation yields 

8a; 



{u}±)x = ?7E + 2y2)VK ) ± ( ~ 4 + x2 + 8y2 + 6x2y2 + 16x2y4) ^ ' 

(w±) v = ^fjL (((4 - x 2 )j£) ± (16 + 24a; 2 + x* + 32x 2 y 2 + 8aV)) • 



Also, 

((1 + 2y 2 )VA) 2 - (-4 + x 2 + 8y 2 + 6x 2 y 2 + 16x 2 y i f = 8y 2 rf > 



whence 



(1 + 2y 2 )VA > I -4 + x 2 + 8y 2 + 6x 2 y 2 + 16x 2 y 4} 



where the equality holds if and only if y = 0. In order to prove the estimate in V, 
write 



8y 2 r\ 



4^ (((1 + 2y 2 ) v r K S j T (-4 + x 2 + 8y 2 + 6x 2 y 2 + 16a; 2 ?/ 4 ) 

S2x y 2 
~ {l + 2y 2 ){{x + 2) 2 + 8x 2 y 2 ) (x-2) 2 + 8x 2 y 2 1 ' 

Since 

((4- x 2 )y/A) 2 - (16 + 24a; 2 + a; 4 + 32a; V + 8x i y 2 ) 2 = -tea?r\ < 2 



< 16 + 24a; 2 + a; 4 + 32x 2 y 2 + 8x 4 y 2 , 



we have 

(4-a-VA 



which yields the sign of (to±) y - 

Boundcdness near x = 00 follows from the rates in x since y is bounded. For 
{x,y) near pg, expand the formula for (lo±) x as a sum of two terms. The first term 
is bounded since vA simplifies; from the computations above, the absolute value 
of the second term is no larger. Since y/y/~K is bounded near po, so is (u)±) y . I 

In bidiagonal coordinates, Wilkinson's step is given by 
W (x,y)=( 1 -±^x,^Ly 

Since from interlacing uj± does not change sign, the restrictions W + : R + — > R C M 2 
and W : R- — > R C M 2 , are continuous in their respective domains and smooth 
except at po- The restrictions of W to the the left and to the right of the vertical 
line r given by x — 2 coincide with W + and W_ and the restrictions of these two 
functions to r yield different values. Figure shows the images of W + and W_, 
clearly contained in R. As we shall see, both W + and W_ are homeomorphisms 
onto their respective images. 

The line r is taken by W + (resp. W_) to an arc contained in i?_ (resp. R+), 
with a cusp at pq. The horizontal axis is a common tangent to the four smooth 
subarcs in the images of r. A straightforward computation verifies that the preimage 
of the vertical line r under W consists of the two smooth arcs 



/ , (x-2)^a;(a; 2 + 2a; + 4) \ 
[ Xi± 4^ J' 

shown in figures and which are tangent to the lines y = ±^-(x — 2) 
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Figure 5: W±(i?±) (shaded) and W ± 1 (r) (thick); in scale. 



Proposition 4.3 The functions W± are orientation preserving homeomorphisms 
onto their respective images. 

Proof: The Jacobian matrix of W ff is 

/ 2(lu ± ) x l + (w±) 2(w±)„ 



DW ± (x,y) 



(1-( W± )) 2 l-(co ± ) (l-(^±)) 2 

v a+^xr (i+(^±)) 2y i+(w±)y 

with determinant given by 



detDW ± (x,y)=±—^ ( 2 ^ xUJ±x + ( u± ) y y + „ ± (i + u± 

It follows from lemmas [4.11 and FOl that each term in the sum between parenthesis 
have the same sign and det DW± (x, y) > if y ^ 0. 

Points in the horizontal axis are fixed points of W±. Figure [5] indicates that the 
boundary of the domains are taken to simple closed curves, which in turn implies 
the result, from standard degree theory. A more rigorous and rather lengthy proof 
is possible using estimates and a little topology, but will be omitted. ■ 

Notice that W, reverses orientations for [x, y) G R- but W_ preserves orien- 
tations. 



5 Quadratic convergence for Wilkinson's iteration 

Theorem 5.1 There exists an open neighborhood A C 7a of P and a closed set 
X C A of zero measure, invariant under W, on which the iteration converges 
quadratically to Pq . The part of X with positive y coordinate is homeomorphic to 
the Cartesian product of a Cantor set and an open interval. 

Numerical evidence indicates that we can take A = U\ . Figure [7| shows X in 
7r-bidiagonal coordinates: X and its mirror image at the y axis map the set of all 
matrices in 7a for which Wilkinson's iteration converges quadratically. Since the 
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r 




Figure 6: Images and preimages of the line x = 2; not in scale. 

Cantor set is extremely thin, the fine structure of the set X is invisible in the figure; 
the curves W± (r) fit inside the largest gaps of X in each quadrant. As we shall 
see, an equivalent characterization of X is wedge invariance: X is the set of points 
whose forward orbit under W is eventually contained in V a ■ Propositions 15.41 and 
15.51 below imply the theorem and provide additional, more technical information 
about X. 

In a self-evident notation, we speak of the upper and lower half-wedges and of 
the NE, NW, SE and SW faces of a wedge V a . Given z S R, set z k+i = W(z k ); 
this is well defined unless z k € r. The sequence (z k ) keN is the W-orbit of z . 




Figure 7: The set X near p ; in scale 

Lemma 5.2 For sufficiently small a > 0, if(x,y) ^ V a , \y\ < a, thenW(x,y) V a . 
Furthermore, a W -orbit tends to po if and only if it is eventually contained in a 
wedge. 

Proof: Consider a short segment in the upper half plane starting at po with argu- 
ment 9. It is easy to see that for 9 > tt — arctan(v / 6/8), the image under W + of 
this segment is a curve tangent to the horizontal axis at po and to the left of the 
vertical line r. Similarly, if 9 < tt — arctan(v / 6/8), the curve is to the right of r. An 
example of this is W + (r), shown in figure We remind the reader that — V&/8 
is the slope of W^ 1 (r) at po. Since —1/6 < — V6/8 < 0, the images of the NW 
face and of the line r are to the left and right, respectively, of the wedge. A similar 
remark holds for W_ and the NE face. 

Near the horizontal axis, |w|/(l + ui) can be assumed to be smaller than 1 and 
therefore the absolute value of the second coordinate of z k is decreasing. The slope 
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of the line joining zq = (x, y) and Z\ is 



V (1 ~^±)(1+^±T^±) 
uj± — 2x(1+uj±) 

Since \u>±\ < 2\y\ (lemma |4.2[) and the second fraction tends to 1/4 when (x,y) 
tends to po, the slope can be assumed to have absolute value greater than 1/9, i.e., 
to be steeper than the faces of the wedge. Thus, V a is further from Zk+i than from 
z k . ■ 

The set X can now be defined either as the set of points whose orbit is eventually 
contained in a wedge or as the set of points z for which lim^oo W fc (z) = po. 
An L-flat arc in V a is the graph r C V a of a L-Lipschitz function / : / — > M. 

Lemma 5.3 There exist a wedge V a * and a positive constant L* < 1/6 with the fol- 
lowing properties. Suppose Tq is an L* -flat arc in V a * , with left endpoint belonging 
to the NW face ofV a * and right endpoint in the vertical line r. Then W+(rJ) con- 
tains T\, also an L* -flat arc in V a * with left (resp. right) endpoint in the NW (resp. 
NE) face of V a * ■ Moreover, such arcs are uniformly pushed towards the horizontal 
axis: 

max y < 1/4 min y. 

(*,3/)eri (x,v)er+ 

Furthermore, W + stretches the horizontal coordinate. More precisely, let the end- 
points of Ti be W + (x± , y±) and for x G [x+, X—], let 4>{x) be the first coordinate of 
W + (x, y) where (x.y) G T^; then 4>'(x) > 1 for all x. 

An analogous statement holds for the action of W_ on an L* -flat arc Tq with 
endpoints now belonging to r and the NE face of the wedge. 

Symmetry with respect to the horizontal axis implies similar results for arcs in 
the lower half wedge. 

Notice that on smaller wedges, the lemma still holds for the same Lipschitz 
constant but given a wedge, the Lipschitz constant cannot be taken arbitrarily 
small. 

Proof: We prove the statements concerning the action of W + on the upper half 
wedge, the others being similar. 

In order to control the slope of images of L-flat arcs, we proceed to prove the 
following claim. Given L > 0, there exists a > such that if (x, y) £ V a then: 

1. the eigenvalues Ao and Ai of DW+(x, y) satisfy |Ao| < 1/4, Ai > 1/2; 

2. for the associated eigenvectors Vi, |cotargt>o| < 1/L and |tanargui| < L. 

Indeed, from lemma l4~2l and the formula for D W + in the proof of proposition ^. 31 
the entries in the second row of DW + tend to zero when (x, y) tends to po, the (1, 2) 
entry is bounded and the (1, 1) entry is larger than 3/4. In a suggestive notation, 

£ W+ = fa ^ 

has eigenvalues 

(ai + eg) ± y/ (ai - e 2 ) 2 + 4a 2 £i 
2 

from which the estimates for Ao and Ai follow. The eigenvectors can be written 
as vo = Co(a2,Ao — ai) and v\ — Ci(e2 — Ai,— ei), from which estimates for the 
arguments also follow, completing the proof of the claim. 
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Assume without loss of generality that L* is so small that 

max y < 2 min y 

(x, y )er (x,y)er 

for any L*-flat arc T in V a *. Assume also that lo + < 1/8 for all (x,y) <G V a * . From 

\ 1 — 0J+ 1 + u+ J 

we have y\ < y/8 proving 

max y < 1/4 min y. 

The claim implies that for all (x,y) € V a , v\ is in the east sector | argt>i| < arctanL 
and vq is in the north sector arctanL < arguo < 7r — arctanL. Thus, the east sector 
is taken by Z?W+(x, y) to a subset of itself. Setting L = L* and a* — a, this in turn 
implies that the image under W + of the arc Tq in the statement of the lemma is an 
arc for which the Lipschitz constant L* still holds. As seen in the beginning of the 
proof, the endpoints of W + (r^) are to the left and right of V a * . The intersection 

of w + (r+) with v a * is r x . 

Write 

1 + 2(u)±) x x — LU± 

(l-- ± ) 2 • 

Take V a * so that (1 — o>±) 2 < 1.01, uj± < 0.01, 2(uj±) x x > 0.1 from which we learn 
that a\ > 1.05, completing the proof. ■ 

A sign sequence is a function s : N — * {±1} or (so, si, S2, ■ ■ ■)■ The distance 
between two distinct sign sequences s and s is 3~ fc , where k is the smallest number 
for which Sk ^ Sk. There is a natural bi-Lipschitz homeomorphism between the 
set S of all sign sequences and the middle third Cantor set /C3 C [0, 1]: take s to 
S/c>o(l + s fc)/3* +1 - More generally, a closed subset JC of a Lipschitz graph T is a 
Cantor set if it has empty interior (in the induced topology in T) and no isolated 
points. As is well known, a subset of a Lipschitz graph T is a Cantor set if and only 
if it is homeomorphic to S. 

Given Zq € R, the ZQ-sign sequence s z ° is such that = +1 iff zj, e R + (where 
z k+1 =W(z k )). 

Proposition 5.4 Let L* and a* be as in lemma 1X31 Let Tq be an L* -flat arc in 
V Q * with endpoints in the NW and NE faces. The set X n Tq is a Cantor set and 
the map taking zq to the ZQ-sign sequence is a bijection from X PI Tq to S. 

The set X is the disjoint union of graphs of Lipschitz functions f s : [0,a*] — * K 
(one for each sign sequence s) taking yQ to the x coordinate of the unique point in 
the intersection of X with the arc y = yo with sign sequence s. 

Numerical analysis gives, for example, 

/(+,+,+,+,...) (1/10) fa 1.70831765759310579903646760761255776476753484977976. 

Proof: Let Tq = Tq n R±. From lemma I5~31 the image of Tq under W± contains 
an L*-flat arc in V a * with endpoints in the NW and NE faces. For a sign sequence 
s = (so, Si, S2, . . .), let Ti be such an arc contained in W So (F^ ), and, more generally, 
Tk+i be an arc contained in W Sfe (r^ fc ). Define intervals I k = [a k ,bk] C r (see figure 
Ifor s =(+,-,-,...)) by 

i = , w. • ■ ■ w sl w so i k - w k i k = v s k " . 
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Figure 8: Some curves I\; schematic. 



Again from lemma HOI |2fc+i| < |-ffc|/4 and the intersection of the nested family 
of intervals Hklk consists of the unique point in X D Tq with sign sequence s. Thus 
the map from S to X n To taking s to the point with sign sequence s is injective 
and continuous, whence X n Tq is a Cantor set. 

Now fix a sign sequence s. Since the arc y = yo is L*-flat, the function f s is 
well defined. We show that f s is Lipschitz with constant 1/L*. Indeed, assume by 
contradiction that j/i and satisfy 

\f s {V\) ~ > ^-\yi - 2/2 1; 

the line through the points (f s (yi),yi) and (/ s (?/2), 2/2) has slope smaller than L* 
and is therefore L*-flat. Thus, there are two points on the intersection of X with 
an L*-flat arc with the same sign sequence, a contradiction. ■ 

The set X is rather thin, with Hausdorff dimension (at least in a neighborhood 
of po) equal to 1; we do not present a proof of this fact. Numerics suggests that X 
is a union of smooth curves X Sl se5, parametrized by (f s (y),y)- The curve X s is 
taken by W to X s r, where s' is the left shift of s: s' — (s(l), s(2), s(3), . . .). 

Proposition 5.5 For zq = (xo,yo) € X, there exists positive constants c,C such 
that, for z n = (x n ,y n ), c\y n \ 2 < \y n +i\ < C\y n \ 2 , i.e., the convergence of z n to p a 
is strictly quadratic. On the other hand, for zq S R — X the convergence of z n is 
strictly cubic. 

Proof: Recall that y n +i = i^jUn- From lemma ITTl there exist positive constants 
C\,C\ such that ci|y| < \u\ < C\\y\ for any point (x,y) £ V a . Also, we may assume 
that 1/2<1 + cj<2 and therefore 

^M 2 < T ^M<2a 1 | y | 2 

2 l + OJ 

and the first claim follows. 

If the limit point is some p — (x, 0), x ^= 0, then there exists positive constants 
c\,C\ such that, in a neighborhood of p, Ci\y\ 2 < \ui\ < Ci\y\ 2 . This follows from 
the fact that uj is smooth and even near p we may write a Taylor expansion as in 
theorem !3.4l The second claim now follows easily. ■ 
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